Project Goals

  1. To characterize how genetic differentiation accumulates within species across a diverse clade of lizards
  2. To determine which organismal and environmental variables (if any) predict variation in the rate at which differentiation accumulates across species
  3. To test if the rate at which genetic differentiation accumulates is correlated with variation in speciation rate.

Things to do

Overview of project

Before we get started, let’s load the data that summarizes all species.

Here’s a phylogeny of the sphenomorphines (outgroups not shown). This includes OTUs only.

## [1] "RSS: 3.19829753114225e-08"

There are 298 putative OTUs in sphenomorphines; this tree samples 248 of them.

Of these OTUs, we sampled 104 for gene flow.

Note that we need to get data for Eremiascincus_douglasi_1; Eremiascincus_isolepis_1; Eremiascincus_richardsonii_2. Right now, these tips have been pasted into their species complex at the halfway point of the terminal branch length.

Across these OTUs, we sampled 1147 individuals for ddRAD data.

Patterns of Genetic Differentiation within Species

Basic Approach

The basic approach we used:

  • we sampled broadly across each OTU (on average, 8.5 individuals per OTU)
  • we calculated Fst for every pairwise comparison of individuals
  • we used these estimates to infer the rate at which genetic differentiation accumulates with geographic distance
  • the slope of this relationship is the key “summary statistic” used in all analyses

Species by species results

We have included detailed summary reports for each OTU in Dropbox/Sphenomorphine_Gene_Flow/figures/maps.pdf. Note that I have trouble opening it in Preview, so you might try Adobe.

Each page shows:

  • a map with the geographic range of the OTU and the individual sampling points
  • the Fst estimates plotted against geographic distance, along with the inferred slope
    • note that Fst is inverse
    • geographic distance is log-transformed
    • both choices follow Rousset 1997
  • a histogram showing variation in slope estimates due to estimation error in Fst
    • to generate this, we ran 100 bootstraps, drawing from the variance in the Fst estimate for each pair of individuals
  • for OTUs where we sampled 6 or more individuals, a plot of how jackknifing individuals influences the Fst slope estimates
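
To make the per-OTU calculation concrete, here is a minimal sketch of one way the slope and its bootstrap distribution could be computed. It assumes that "inverse Fst" means the Fst / (1 - Fst) transform of Rousset 1997, that distance is natural-log transformed, and that bootstrap draws come from a normal distribution around each pairwise Fst. The function names, the error model, and the toy numbers are all hypothetical, not the project's actual code or data.

```python
import numpy as np

def ibd_slope(fst, dist_km):
    """Slope of Fst / (1 - Fst) regressed on ln(geographic distance)."""
    fst = np.asarray(fst, dtype=float)
    dist_km = np.asarray(dist_km, dtype=float)
    y = fst / (1.0 - fst)   # assumed reading of "inverse Fst"
    x = np.log(dist_km)     # natural log of pairwise distance
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def bootstrap_slopes(fst, fst_se, dist_km, n_boot=100, seed=0):
    """Redraw each pairwise Fst from Normal(fst, fst_se) and refit the slope."""
    rng = np.random.default_rng(seed)
    fst = np.asarray(fst, dtype=float)
    fst_se = np.asarray(fst_se, dtype=float)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        draw = rng.normal(fst, fst_se)
        draw = np.clip(draw, 0.0, 0.99)  # keep the transform finite
        slopes[i] = ibd_slope(draw, dist_km)
    return slopes

# Toy pairwise comparisons (made-up values, for illustration only)
dist_km = np.array([5.0, 20.0, 80.0, 150.0, 400.0, 900.0])
fst = np.array([0.02, 0.05, 0.09, 0.11, 0.16, 0.20])
fst_se = np.full(fst.shape, 0.01)

point_estimate = ibd_slope(fst, dist_km)
boot = bootstrap_slopes(fst, fst_se, dist_km)
print(point_estimate, boot.mean(), boot.std())
```

A jackknife over individuals would follow the same pattern: drop one individual, remove every pairwise comparison involving it, and refit the slope.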

Take homes:

  • some OTUs likely have some individuals that are miscategorized to OTU
    • see Anomalopus verreauxii, for example
    • this also tends to lead to a multi-modal spread in Fst slope jackknife estimates (i.e., panel 4)
  • not that much error is introduced to slope estimates because of inaccurate estimation of Fst (I think this is because we typically are inferring Fst across many sites)
  • a decent amount of error is introduced because of sampling across range
    • it seems that the fewer individuals sampled, the higher the inferred slope
    • should control for that by including number of individuals sampled as a factor in any model fitting

Differences in approaches used to estimate genetic difference

We estimated genetic differentiation between individuals using:

  • mtDNA dxy
  • nDNA dxy
  • Fst

Note that there is no strong theoretical framework for using mtDNA dxy and nDNA dxy to look at differentiation across the landscape, so we focus on Fst here. However, if we compare pairwise estimates across all individuals, these different approaches to measuring differentiation are all pretty correlated.

However, particularly in the comparison between mtDNA dxy and nDNA dxy, we see some differences that are likely due to cytonuclear introgression.

Environmental vs. geographic distance

While not a key question in this work, much of the current genetic differentiation literature is centered around IBD (isolation-by-distance) vs. IBE (isolation-by-environment). To help put our work in this context, we looked at how much of the variation in pairwise Fst is explained by geographic vs. environmental distance.

This plot only shows significant results, and uses Wang (2012)’s approach to multiple-matrix regression.

Gray bars show environmental distance and black bars show geographic distance; the total length of each bar reflects how much of the Fst variation is explained by the two measures together. In general, we can explain about half of the pattern of genetic differentiation across OTUs.
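
For readers unfamiliar with multiple-matrix regression, the sketch below shows the general idea: the pairwise genetic distance matrix is regressed on the geographic and environmental distance matrices, and significance is assessed by permuting the rows and columns of the genetic matrix together. This is a simplified illustration and not the implementation used for the plot. In particular, it permutes on the magnitude of the standardized coefficients rather than on a test statistic, and the function name `mmrr` and all toy data are assumptions.

```python
import numpy as np

def _upper(mat):
    """Vectorize the strict upper triangle of a square distance matrix."""
    return mat[np.triu_indices_from(mat, k=1)]

def _standardize(v):
    return (v - v.mean()) / v.std()

def mmrr(gen, predictors, n_perm=999, seed=0):
    """Regress a genetic distance matrix on predictor distance matrices.

    Returns (standardized coefficients, R^2, permutation p-values).
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([_standardize(_upper(p)) for p in predictors])
    X = np.column_stack([np.ones(X.shape[0]), X])

    def fit(g):
        y = _standardize(_upper(g))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - (resid @ resid) / (y @ y)  # y has mean zero
        return beta[1:], r2

    beta_obs, r2_obs = fit(gen)
    exceed = np.ones(len(predictors))  # count the observed arrangement once
    n = gen.shape[0]
    for _ in range(n_perm):
        order = rng.permutation(n)
        # Shuffle individuals: same reordering applied to rows and columns
        beta_perm, _r2 = fit(gen[np.ix_(order, order)])
        exceed += np.abs(beta_perm) >= np.abs(beta_obs)
    return beta_obs, r2_obs, exceed / (n_perm + 1)

# Toy example: 12 individuals with made-up coordinates and environments
rng = np.random.default_rng(1)
n = 12
xy = rng.uniform(0, 100, size=(n, 2))
env = rng.normal(size=n)
geo_d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
env_d = np.abs(env[:, None] - env[None, :])
noise = rng.normal(scale=0.05, size=(n, n))
noise = (noise + noise.T) / 2
gen_d = 0.002 * geo_d + 0.05 * env_d + noise
np.fill_diagonal(gen_d, 0.0)

coefs, r2, pvals = mmrr(gen_d, [geo_d, env_d], n_perm=199)
print(coefs, r2, pvals)
```

Because the predictors are standardized, the two coefficients are on a comparable scale, which is what makes a side-by-side comparison of geography and environment meaningful.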

Patterns of Genetic Differentiation among Species

Variation in IBD slopes

There is a decent amount of variation in slopes inferred across OTUs. This shows all slopes, including the few which were non-significant. Note that it is also not very normally distributed, so all subsequent analyses will use log(slope).

Phylogenetic Signal in IBD slopes

Another way to look at this is how these slopes change across genera. Here, we show the same results as above, but only for genera in which we calculated slopes for 3 or more OTUs.

If we look at these slopes across the phylogeny, we recapitulate what we saw in the boxplot: there is phylogenetic signal in the pattern of slopes. (To do: add legend.) Indeed, the pattern of differentiation does show phylogenetic signal: lambda is 0.2985306, with a p-value of 0.0125744.
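
As a reminder of what a lambda estimate and its p-value represent, here is a minimal sketch of Pagel's lambda fitted by grid search. It assumes we already have the tip covariance matrix implied by the tree under Brownian motion, scales the off-diagonal entries by lambda, and compares the best fit against lambda = 0 with a likelihood-ratio test on one degree of freedom. The grid search, the function names, and the toy covariance matrix are assumptions for illustration; the numbers reported above came from our R workflow, not from this code.

```python
import math
import numpy as np

def bm_loglik(y, C):
    """Brownian-motion log-likelihood with root state and rate at their MLEs."""
    n = len(y)
    C_inv = np.linalg.inv(C)
    ones = np.ones(n)
    root = (ones @ C_inv @ y) / (ones @ C_inv @ ones)
    resid = y - root
    sigma2 = (resid @ C_inv @ resid) / n
    _sign, logdet = np.linalg.slogdet(sigma2 * C)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + n)

def lambda_transform(C, lam):
    """Scale shared (off-diagonal) covariance by lambda; keep tip variances."""
    out = C * lam
    np.fill_diagonal(out, np.diag(C))
    return out

def fit_lambda(y, C, grid=None):
    """Return (lambda estimate, likelihood-ratio statistic, p-value vs lambda = 0)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    logliks = np.array([bm_loglik(y, lambda_transform(C, lam)) for lam in grid])
    best = int(np.argmax(logliks))
    loglik_zero = bm_loglik(y, lambda_transform(C, 0.0))
    lr = max(2.0 * (logliks[best] - loglik_zero), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square tail, 1 d.f.
    return float(grid[best]), lr, p_value

# Toy covariance: 8 tips arranged as 4 sister pairs sharing 60% of their history
C = np.kron(np.eye(4), np.array([[1.0, 0.6], [0.6, 1.0]]))
rng = np.random.default_rng(2)
y = rng.multivariate_normal(np.zeros(8), C)

lam_hat, lr_stat, p_val = fit_lambda(y, C)
print(lam_hat, lr_stat, p_val)
```

This sketch bounds lambda at 1; some implementations allow slightly larger values when the tree permits it.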

What explains this variation in IBD slopes?

What explains why there is variation in IBD slopes?

Genetic differentiation (as measured here) is a function of effective population size (Ne) and migration (m). Further, arguments centered around IBE suggest that the more environmental heterogeneity there is across a range, the more differentiation we would expect. So, based on this, we include the following factors:

  • range size: a proxy for effective population size
  • SVL: a proxy for effective population size
  • mean pi, or within-population pi: a more direct measure of effective population size
  • shank length: these lizards have differing levels of limb loss, which (naively) we might expect to correlate with dispersal patterns; shank length reflects this
    • note that I’ve also explored toe length as a proxy for this; these are both very correlated with each other
  • elevational range: a proxy for environmental heterogeneity
  • PC1 range in climate space: a proxy for environmental heterogeneity
  • ninds: number of individuals used to infer IBD (a nuisance variable)
  • lat. midpoint: another nuisance variable

First, we looked at correlations among the factors in our model. Including factors with too much collinearity can make model fitting do weird things.
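
A quick way to screen for this is to compute the pairwise correlation matrix and flag any pair above a cutoff. The sketch below does that; the 0.7 cutoff, the function name, and the simulated values are arbitrary assumptions rather than choices made in this project.

```python
import numpy as np

def flag_collinear(data, names, threshold=0.7):
    """Return the correlation matrix and the predictor pairs with |r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return corr, flagged

# Simulated predictors: toe length is built to track shank length closely
rng = np.random.default_rng(3)
n_otus = 50
shank = rng.normal(size=n_otus)
toe = shank + rng.normal(scale=0.2, size=n_otus)
elev_range = rng.normal(size=n_otus)
predictors = np.column_stack([shank, toe, elev_range])

corr, flagged = flag_collinear(predictors, ["shank", "toe", "elev_range"])
print(flagged)
```

When a pair is flagged, one option is to keep only one member of the pair in the model.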

Then, we test whether these factors show phylogenetic signal. Note that within-population pi (mean_pi) didn’t show phylogenetic signal when measured just across Ctenotus. All our factors (except ninds) show strong patterns of phylogenetic signal.

factor          lambda       pvalue
ninds           0.1761827    0.3328809
mean_pi         0.8076407    0.0000000
svl             1.0508282    0.0000000
shank           1.0872383    0.0000000
range_size      0.8722269    0.0000037
lat_midpoint    0.9682147    0.0000020
elev_range      0.9364051    0.0000000
PC1_range       0.9039572    0.0000003

Now, we do our model-fitting exercise, in which we infer which of these variables best predict variation in slope across OTUs. This is the same model-averaging approach used in Singhal et al 2017 and first described by Burnham and Anderson 2002.

We took the natural log of two independent variables (ninds, range size) and of the dependent variable (slope), based on visual inspection of histograms.
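
The general shape of an all-subsets, information-theoretic model-averaging exercise can be sketched as follows. With eight candidate predictors there are 2^8 - 1 = 255 non-empty subsets; each is fitted, scored with AICc, converted to an Akaike weight, and each predictor's relative importance is the summed weight of the models that contain it. This sketch uses ordinary least squares on simulated data purely for illustration. It ignores phylogenetic non-independence, and the function names and simulated effects are assumptions, not our results.

```python
import itertools
import numpy as np

def aicc(rss, n, k):
    """Small-sample AIC for a Gaussian linear model (up to an additive constant).

    k counts every estimated parameter: slopes, intercept, residual variance.
    """
    aic = n * np.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def all_subsets_importance(X, y, names):
    """Fit every non-empty predictor subset; return model count and importances."""
    n, p = X.shape
    models = []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n), X[:, list(cols)]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ beta
            models.append((cols, aicc(float(resid @ resid), n, len(cols) + 2)))
    scores = np.array([score for _cols, score in models])
    weights = np.exp(-0.5 * (scores - scores.min()))
    weights /= weights.sum()  # Akaike weights sum to 1
    importance = {name: 0.0 for name in names}
    for (cols, _score), w in zip(models, weights):
        for c in cols:
            importance[names[c]] += w
    return len(models), importance

names = ["ninds", "mean_pi", "svl", "shank", "range_size",
         "lat_midpoint", "elev_range", "PC1_range"]
rng = np.random.default_rng(0)
X = rng.normal(size=(60, len(names)))
# Simulated response driven by mean_pi and shank only
y = -1.0 * X[:, 1] - 0.8 * X[:, 3] + rng.normal(scale=0.5, size=60)

n_models, importance = all_subsets_importance(X, y, names)
print(n_models)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {value:.3f}")
```

Predictors with importance near 1 appear in essentially all of the well-supported models; those with lower values contribute little beyond what the other predictors already explain.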

## Model 100 of 255 done.
## Model 200 of 255 done.

These results suggest that the most important factors explaining variation in IBD slopes are:

  • within-pop pi: as expected, as Ne increases, differentiation decreases; bigger demes experience less drift and thus differentiate more slowly from adjoining demes
  • shank length: as expected, as an OTU has more substantial legs, differentiation decreases; a naive explanation is that leggier lizards move more, and higher dispersal leads to reduced differentiation
    • this also might be a body size effect; the correlation between SVL and shank length is ~0.4
    • bigger animals are also expected to disperse more
    • though, bigger animals are also expected to have smaller population sizes
  • elevational range: this supports the IBE hypothesis that we see more differentiation with more heterogeneity (i.e., a positive correlation)
    • I do not trust this result; see below

We can look at these significant results one by one.

Correlations between Slopes inferred using different metrics

We include this analysis because I think it will be of interest to people in the community. The slopes are correlated across metrics, but not strongly. Note (again) that there is no theoretical foundation for looking at mtDNA and nDNA dxy differentiation across space.

Genetic Differentiation and Speciation

Variation in speciation results

One of the classic results in this work comes from Dan’s work (Rabosky et al 2007, Rabosky et al 2014) showing that Ctenotus & Lerista have much higher rates of speciation than the rest of the clade. I am repeating these results here using our tree, which includes putative new OTUs. For now, we use only the diversification rate (DR) statistic of Jetz et al 2012.
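
For reference, the DR statistic for a tip is the inverse of its equal-splits measure: walk from the tip to the root, summing branch lengths, with each successively deeper branch down-weighted by a further factor of one half. The sketch below computes it on a tiny hand-built tree; the dictionary-based tree representation and the function name are hypothetical conveniences, not how our tree is actually stored.

```python
def dr_statistic(parent, branch_length, tips):
    """DR for each tip: 1 / sum_j (l_j / 2**(j - 1)), with j = 1 the terminal branch.

    parent maps each node to its parent (the root has no entry);
    branch_length maps each node to the length of the branch above it.
    """
    rates = {}
    for tip in tips:
        equal_splits = 0.0
        weight = 1.0
        node = tip
        while node in parent:
            equal_splits += branch_length[node] * weight
            weight /= 2.0  # halve the weight at each step toward the root
            node = parent[node]
        rates[tip] = 1.0 / equal_splits
    return rates

# Toy tree in Newick form: ((A:1,B:1)n1:1,C:2)root;
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
branch_length = {"A": 1.0, "B": 1.0, "n1": 1.0, "C": 2.0}

rates = dr_statistic(parent, branch_length, ["A", "B", "C"])
print(rates)
```

Here A and B each get 1 / (1 + 0.5) ≈ 0.667 while C gets 1 / 2 = 0.5, so tips on short, recently split branches receive higher DR values.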

As shown earlier, we still see that Ctenotus & Lerista have higher rates of speciation than the rest of the clade.

Here, we show the phylogeny with variance in speciation rates again, showing only those tips that we sampled.

Connection between genetic differentiation and speciation?

Finally, we test whether variation in genetic differentiation can explain variation in speciation rates, using a PGLS.

We tested one model in which we took the log of the differentiation slope and one in which we did not; in neither does the differentiation slope significantly predict the DR speciation rate. For the untransformed data, the p-value is 0.9035116; for the log-transformed data, it is 0.9913201.
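
For anyone who wants to see what the PGLS is doing under the hood, it is a generalized least squares regression in which the residual covariance among tips is taken from the phylogeny. The sketch below assumes a Brownian-motion covariance matrix is already in hand and fits y ~ 1 + x; the function name, the toy covariance matrix, and the simulated values are assumptions for illustration and do not reproduce the p-values above.

```python
import numpy as np

def pgls(y, x, C):
    """GLS fit of y ~ 1 + x with phylogenetic covariance C among tips.

    Returns (coefficients, standard errors, t statistics) for intercept and slope.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    C_inv = np.linalg.inv(C)
    xtcx_inv = np.linalg.inv(X.T @ C_inv @ X)
    beta = xtcx_inv @ X.T @ C_inv @ y
    resid = y - X @ beta
    sigma2 = (resid @ C_inv @ resid) / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * xtcx_inv))
    return beta, se, beta / se

# Toy covariance: 8 tips arranged as 4 sister pairs sharing 60% of their history
C = np.kron(np.eye(4), np.array([[1.0, 0.6], [0.6, 1.0]]))
rng = np.random.default_rng(4)
log_slope = rng.multivariate_normal(np.zeros(8), C)   # stand-in predictor
dr_rate = rng.multivariate_normal(np.zeros(8), C)     # stand-in response

beta, se, t_stat = pgls(dr_rate, log_slope, C)
print(beta, se, t_stat)
```

With C set to the identity matrix this reduces to ordinary least squares, which is a handy sanity check.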

Our data provide no evidence that variation in rates of genetic differentiation explains variation in speciation rates.